Inference with minimal Gibbs free energy in information field theory 
Non-linear and non-Gaussian signal inference problems are difficult to tackle. Renormalization 
techniques permit us to construct good estimators for the posterior signal mean within information 
field theory (IFT), but the approximations and assumptions made are not very obvious. Here we 
introduce the simple concept of minimal Gibbs free energy to IFT, and show that previous renor- 
malization results emerge naturally. They can be understood as being the Gaussian approximation 
to the full posterior probability, which has maximal cross information with it. We derive optimized 
estimators for three applications, to illustrate the usage of the framework: (i) reconstruction of a 
log-normal signal from Poissonian data with background counts and point spread function, as it 
is needed for gamma ray astronomy and for cosmography using photometric galaxy redshifts, (ii) 
inference of a Gaussian signal with unknown spectrum and (iii) inference of a Poissonian log-normal 
signal with unknown spectrum, the combination of (i) and (ii). Finally we explain how Gaussian 
knowledge states constructed by the minimal Gibbs free energy principle at different temperatures 
can be combined into a more accurate surrogate of the non-Gaussian posterior. 



I. INTRODUCTION 
A. Abstract inference problem 

Measurements provide information on the signals we 
are interested in, encoded in the delivered data. How 
can this information be best retrieved? Is there a generic 
and simple principle from which optimal data analysis 
strategies derive? Can an information energy be con- 
structed which - if minimized - provides us with the cor- 
rect knowledge state given the data and prior informa- 
tion? And if this exists, how can this information ground 
state be found at least approximatively? 

An information energy, to be minimized, would be very 
useful to have, since many of the existing minimization 
techniques, analytical and numerical, can then be ap- 
plied to it. A number of such functions to be extremized 
to solve inference problems were proposed in the liter- 
ature, like the likelihood, the posterior, or the entropy. 
The likelihood is the probability that the data has re- 
sulted from some signal. The posterior is the reverse, it 
is the probability that given the data some signal was the 
origin of it. Extremizing either of them certainly makes 
sense, but often ignores the presence of slightly less prob- 
able, but much more numerous possibilities in the signal 
phase space. Those have a much larger entropy and are 
therefore favored by maximum entropy methods. How- 
ever, maximum entropy alone can not be the inference 
determining criterion, since it favors states of complete 
lack of knowledge, irrespective of the data. Thus some 
counteracting energy is required which provides the right 
amount of force to the inference solution. Here, we ar- 
gue that the ideal information energy is provided by the 
Gibbs free energy, which combines both maximum en- 
tropy and maximum a posteriori (MAP) principles. 

The Gibbs free energy has to be regarded as a functional
over the space of possible probability density functions
(PDF) of the signal given the data. The result of
the minimization is therefore a PDF itself, and not a single
signal estimate. Minimizing the Gibbs free energy
maximizes the entropy within the constraints given by
the internal energy. The latter is understood as the average
of the negative logarithm of the joint probability
function of signal and data weighted with the PDF.

The usage of thermodynamical concepts for inference
problems is not new, see e.g. [1]. What is new here
is that we develop this for signals which are fields, i.e. spatially
distributed quantities with an infinite number of
degrees of freedom, while using an approximate Gaussian
ansatz for the PDF to be inferred. We thereby connect
information field theory (IFT) [sl-fl^, as a statistical
field theory dealing with a huge number of microscopic
degrees of freedom, to thermodynamics, as a means to
generate simplified, but macroscopic descriptions of our
knowledge. Thereby we find that former IFT results obtained
with complex renormalization schemes in [ll|
can easily be reproduced, and even be extended to more
complicated measurement situations.

In the remainder of Sect. I we briefly introduce
IFT, MAP, and Maximum Entropy. This motivates the
minimal Gibbs free energy principle, which we formally
derive in Sect. II, where we also show its equivalence to maximal
cross information. The application of this principle to
optimize approximations of the posterior of concrete inference
problems is provided in Sect. III. There, the
log-normal Poisson problem (Sect. III A), the problem
of reconstruction without a known signal power spectrum
(Sect. III B), as well as their combination (Sect. III C)
are addressed. Finally, we show how approximate posteriors
obtained at different temperatures can be combined
into a better posterior surrogate in Sect. IV, before we
conclude in Sect. V.



B. Information field theory 

Information theory describes knowledge states with
probabilities. If $\Omega$ is the complete set of possibilities,
and $A \subset \Omega$ is a subset, then $P(A) \in [0, 1]$ describes the
plausibility of $A$ being the case, with $P(A) = 1$ denoting
$A$ being assumed to be sure, $P(A) = 0$ denoting $A$
being (assumed to be) impossible, and $0 < P(A) < 1$
describing uncertainty about the truth of $A$. Obviously
$P(\Omega) = 1$ and $P(\emptyset) = 0$. The usual rules of probability
theory apply, and generalize the binary logic of
Aristotle to different degrees of certainty or uncertainty
[T^ Hsj. In case the set of possibilities is a continuum,
it makes sense to introduce a PDF $\mathcal{P}(\psi)$ over $\Omega$, so that
$P(A) = \int_A d\psi \, \mathcal{P}(\psi)$. Each possible state $\psi$ can be a
multi-component vector, containing all aspects of reality
which are in the focus of our inference problem.

We might be interested in a sub-aspect of $\psi$ which
we call our signal $s = s(\psi)$. The induced signal PDF
is retrieved from a functional or path integral over all
the phase spaces of the possibilities of $\psi$ via
$\mathcal{P}(s) = \int \mathcal{D}\psi \, \mathcal{P}(\psi) \, \delta(s - s(\psi))$.
If $s$ is a field, a function over
a physical space $V$, then $s = (s_x)_{x \in V}$ might be a vector
in the Hilbert space of all $L^2$-integrable functions over
$V$ and $\mathcal{P}(s)$ is then a probability density functional.
Information theory for $s$ becomes IFT, which is a statistical
field theory.

Inference on the signal $s$ from data $d$ is done from the
posterior probability $\mathcal{P}(s|d)$, which can be constructed
from the joint PDF of signal and data $\mathcal{P}(d,s)$ via

$$ \mathcal{P}(s|d) = \frac{\mathcal{P}(d,s)}{\mathcal{P}(d)} \equiv \left. \frac{e^{-\beta H(d,s)}}{Z_\beta(d,0)} \right|_{\beta=1}, \tag{1} $$

where $\mathcal{P}(d,s) = \int \mathcal{D}\psi \, \mathcal{P}(d|\psi) \, \delta(s - s(\psi)) \, \mathcal{P}(\psi) = \mathcal{P}(d|s) \, \mathcal{P}(s)$
and $\mathcal{P}(d) = \int \mathcal{D}s \, \mathcal{P}(d,s)$. The second equality
in (1) is just a renaming of the numerator and denominator
of the first fraction, which highlights the connection
to statistical mechanics. Thus we define the information
Hamiltonian

$$ H(d,s) \equiv -\log \mathcal{P}(d,s), \tag{2} $$

the partition function including a moment generating
source term $J$

$$ Z_\beta(d,J) \equiv \int \mathcal{D}s \, e^{-\beta \left( H(d,s) + J^\dagger s \right)}, \tag{3} $$

and the inverse temperature $\beta = 1/T$ as usual in statistical
mechanics. Here $s^\dagger$ is the transposed and complex
conjugated signal vector $s$, leading to a scalar product
$j^\dagger s = \int_V dx \, \bar{j}_x s_x$. The ad-hoc notion of temperature is
as in standard simulated annealing practice. It permits us
to narrow (for $T < 1$) or widen (for $T > 1$) the explored
phase space region with respect to the one of the joint
PDF and therefore is a useful auxiliary parameter. We
show in Sect. II A that the well known thermodynamical
equipartition theorem holds:

$$ \langle H(s,d) \rangle_{(s|d)} - H(m,d) \approx \frac{1}{2} N_{\mathrm{dgf}} \, T, \tag{4} $$

where $N_{\mathrm{dgf}}$ is the number of degrees of freedom and $m$
is the mean signal field as defined below in (5), which
defines the ground state energy. This relation can e.g. be
used to check the correctness of an implementation of a
signal phase-space sampling algorithm.
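The sampling check mentioned above can be sketched for the simplest case, a free (quadratic) Hamiltonian, where the tempered posterior is itself Gaussian and can be sampled directly. This is a minimal illustration and not part of the original derivation; the dimension, temperature, `m_star`, and `D_star` are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_dof = 4                                   # number of degrees of freedom N_dgf
T = 2.0                                     # temperature (assumed value)
m_star = np.array([1.0, -0.5, 0.3, 2.0])    # mean of the toy posterior (assumed)
A = rng.normal(size=(n_dof, n_dof))
D_star = A @ A.T + n_dof * np.eye(n_dof)    # positive definite covariance at T = 1
D_star_inv = np.linalg.inv(D_star)

def hamiltonian(s):
    """Free Hamiltonian H = (s - m_star)^T D_star^{-1} (s - m_star) / 2."""
    phi = s - m_star
    return 0.5 * np.einsum("...i,ij,...j->...", phi, D_star_inv, phi)

# For a quadratic H the tempered posterior exp(-H/T) is Gaussian
# with mean m_star and covariance T * D_star.
samples = rng.multivariate_normal(m_star, T * D_star, size=200_000)
excess_energy = hamiltonian(samples).mean() - hamiltonian(m_star)

print(excess_energy, 0.5 * n_dof * T)       # both should be close to N_dgf * T / 2
```

A sampler for a non-Gaussian posterior could be tested the same way, although (4) then holds only approximately.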



C. Maximum a posteriori 

The first guess for a suitable energy to be minimized
to obtain the information state might be the Hamiltonian.
Minimizing the Hamiltonian with respect to $s$,
while keeping $d$ at its observed value, is equivalent
to maximizing the joint probability $\mathcal{P}(d,s)$ and also the
posterior $\mathcal{P}(s|d)$. The classical field emerging from this is
called the MAP signal reconstruction in signal process- 
ing. For a detailed discussion of the usage of the MAP 
principle in IFT see . The MAP field is often a very 
good approximation of the mean field 



$$ m = \langle s \rangle_{(s|d)} = \int \mathcal{D}s \; s \, \mathcal{P}(s|d), \tag{5} $$

which is the optimal estimator of the signal in a statistical
$L^2$ error norm sense:

$$ m = \mathrm{argmin}_{\tilde{m}} \left\langle \int dx \, |s_x - \tilde{m}_x|^2 \right\rangle_{(s|d)}. \tag{6} $$

The MAP estimator on the other hand can be shown
to optimize the statistical $L^0$ norm^1, the result of which
may strongly deviate from the mean $m$ if the posterior
is highly asymmetric around its maximum. Thus we can
regard the MAP estimator as a good reference point, but
not as the solution we are seeking in general. It is, however,
accurate (in the $L^2$ error norm sense) in case the
posterior around its maximum is close to a Gaussian. In
this case, the MAP field can easily be augmented with
some uncertainty information from the Hessian of the
Hamiltonian

$$ \mathcal{H} = \frac{\delta^2 H(d,s)}{\delta s \, \delta s^\dagger} \tag{7} $$

as an approximation of the two point function of the signal
uncertainty

$$ D = \left\langle (s - m)(s - m)^\dagger \right\rangle_{(s|d)}. \tag{8} $$

Thus we set $D \approx \mathcal{H}^{-1}$ in

$$ \mathcal{P}(s|d) \approx \tilde{\mathcal{P}}(s|d) = \mathcal{G}(s - m, D), \tag{9} $$

where we introduced the Gaussian

$$ \mathcal{G}(\phi, D) = \frac{1}{|2\pi D|^{1/2}} \exp\left( -\frac{1}{2} \phi^\dagger D^{-1} \phi \right). \tag{10} $$

Unfortunately, the MAP estimator can perform suboptimally
in cases where the Gaussian approximation does
not hold, see e.g. [lTj |.
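The difference between the MAP value and the posterior mean for an asymmetric posterior can be illustrated with a one-pixel toy problem evaluated on a grid. The sketch below assumes a unit-variance Gaussian prior and a Poisson likelihood with rate $e^{s}$ and zero observed counts; these choices and all variable names are illustrative assumptions, not taken from the text.

```python
import numpy as np

# One-pixel toy posterior: H(s) = s^2/2 - d*s + exp(s), up to a constant.
d = 0
s = np.linspace(-12.0, 6.0, 180_001)
ds = s[1] - s[0]
H = 0.5 * s**2 - d * s + np.exp(s)

posterior = np.exp(-(H - H.min()))
posterior /= posterior.sum() * ds           # normalize on the grid

s_map = s[np.argmin(H)]                     # maximum a posteriori value
s_mean = (s * posterior).sum() * ds         # posterior mean

print(s_map, s_mean)
```

Because the likelihood cuts off positive values of $s$ much more sharply than the prior cuts off negative ones, the posterior is skewed and its mean lies below the MAP value.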



^1 The $L^0$ norm measures the amount of exact agreement via
$\|f\|_0 = \lim_{\epsilon \to 0} \int dx \, \theta(f^2(x) - \epsilon^2)$, with $\theta$ denoting the Heaviside
function.

D. Maximum Entropy

1. Image entropy

Another quantity often extremized in image reconstruction
problems is the so-called image entropy (iE)
[13-25]. In classical maximum (image) entropy (MiE
here, usually ME) methods the iE is defined for a strictly
positive signal via

$$ S_{\mathrm{iE}}(s) = -\int dx \, s_x \log(s_x / \bar{s}_x) = -s^\dagger \log(s / \bar{s}), \tag{11} $$

where $\bar{s}_x$ is the reference image, which is used to model
some prior information. In the second equality we have
defined the component-wise application of functions on
fields, e.g. $(f(s))_x = f(s_x)$, which we use throughout
this work.

We note, that the iE is actually not a physical entropy. 
Usually its usage is argued for by ad hoc assumptions on 
the distribution entropy of photon packages in the image 
plane, rather than being a well motivated description of 
the signal prior knowledge (or lack thereof). In the fol- 
lowing we will reveal the implicitly assumed prior of MiE 
methods. 

The data enter the MiE method in the form of an image
energy, which is ideally chosen to be the negative
log-likelihood,

$$ E(d|s) \equiv -\log(\mathcal{P}(d|s)), \tag{12} $$

in order to ensure the best imprint of the data on the
reconstruction. The entropy is then maximized with the
energy constraint given by minimizing

$$ E_{\mathrm{iE}}(d,s) = E(d|s) - T \, S_{\mathrm{iE}}(s) \tag{13} $$

with respect to $s$. Here $T$ is some adjustable
temperature-like parameter, permitting us to choose the
relative weight of image entropy and image energy. Low
temperature means that the MiE map follows the data
closely, high temperature that the map space wants to be
more uniformly occupied by the signal reconstruction.

The prior information on the signal, $\mathcal{P}(s)$, does not
enter the MiE formalism explicitly. Actually, an implicit
prior can be identified, assuming that MiE is actually
a MAP principle. In that case the implicitly assumed
Hamiltonian is $H_{\mathrm{iE}}(d,s) \,\hat{=}\, E_{\mathrm{iE}}(d,s)$, where $\hat{=}$ denotes
equality up to an irrelevant, since $s$-independent, additive
constant, and we find

$$ \mathcal{P}_{\mathrm{iE}}(s) \propto e^{T S_{\mathrm{iE}}(s)} \propto \prod_x \left( \frac{s_x}{\bar{s}_x} \right)^{-T s_x}. \tag{14} $$

This is not a general prior, but a very specific PDF. Although
there is some flexibility to adapt its functional
form by choosing $\bar{s}$, $T$, and the image space (pixel space,
Fourier space, wavelet space, etc.) in which (11) holds,
$\mathcal{P}_{\mathrm{iE}}(s)$ can not be regarded as being generic. The MiE



prior strongly suppresses large values in the MiE map. If 
a data feature can be either explained by a single map 
pixel exhibiting a peak value or by several pixels dividing 
that value among themselves, MiE will usually prefer the 
second option, leading to blurred reconstructed images. 

We conclude that the term maximum entropy commonly
used in image reconstruction is very misleading. A
more accurate term would be minimal dynamical range,
since the implicitly assumed prior states that pixels carrying
larger than average signal $s_x$ are extremely unlikely.
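The preference of the implicit MiE prior for spreading flux over several pixels follows from the concavity of $-s \log(s/\bar{s})$ and can be checked directly. The following sketch evaluates the unnormalized log-prior $T\,S_{\mathrm{iE}}(s)$ for two hypothetical three-pixel maps with the same total flux; the function name and the numbers are assumptions made for illustration.

```python
import numpy as np

def log_mie_prior(s, s_ref, T=1.0):
    """Unnormalized log of the implicit MiE prior, T * S_iE(s), for s > 0."""
    s = np.asarray(s, dtype=float)
    return -T * np.sum(s * np.log(s / s_ref))

s_ref = 1.0                                  # flat reference image (assumed)
peaked = [9.0, 1.0, 1.0]                     # one pixel carries the excess flux
spread = [11.0 / 3.0] * 3                    # same total flux, evenly divided

print(log_mie_prior(peaked, s_ref), log_mie_prior(spread, s_ref))
```

The evenly divided map receives the larger prior weight, which is the blurring tendency described above.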



2. Physical entropy 

A physical entropy should measure the distribution
spread of a PDF using a phase space integral over its
phase space. In fact, the latter is given by the Boltzmann
entropy, i.e. the negative Shannon information,

$$ S_B = -\int \mathcal{D}s \, \mathcal{P}(s|d) \log \mathcal{P}(s|d), \tag{15} $$

which is a functional of the signal posterior, $S_B = S_B[\mathcal{P}(s|d)]$,
and not of the signal map. Inserting (1)
yields

$$ S_B = \langle H(d,s) \rangle_{(s|d)} + \log Z_1(d,0) = U - F, \tag{16} $$

where we introduced the internal energy $U = \langle H(d,s) \rangle_{(s|d)}$
and the Helmholtz free energy $F = F_1(d,0)$
with

$$ F_\beta(d,J) = -\frac{1}{\beta} \log Z_\beta(d,J). \tag{17} $$

The fully $J$-dependent Helmholtz free energy provides
the field expectation value via

$$ \langle s \rangle_{(s|d)} = \left. \frac{\partial F_\beta(d,J)}{\partial J} \right|_{\beta=1, J=0}. \tag{18} $$

The entropy is also given in terms of the free energy via

$$ S_B = \left. \frac{\partial F_\beta(d,J)}{\partial \beta} \right|_{\beta=1, J=0}. \tag{19} $$

The entropy as well as the free energy are functionals of
the posterior and not of the signal. Maximizing or minimizing
them does not provide a signal estimator, but
singles out a PDF. If we restrict the space of PDFs to
the ones we can handle analytically, namely Gaussians
as given in (9) and (10), we might obtain a suitable approximation
scheme to the full field theoretical inference
problem.

Maximizing the entropy alone does not lead to a suitable
algorithm, since the maximal entropy state is that of
complete lack of knowledge, with a uniform probability
for every signal possibility. The internal energy, however,
favors knowledge states close to the posterior maximum
and would return the MAP solution if extremized alone.
Thus the right combination of entropy and internal energy
is to be extremized. We would expect a free energy
of the form $U - T S_B$ to be this function, in analogy to
the energy (13) used in MiE methods. Thermodynamics
teaches us that the Gibbs free energy is the quantity to
be minimized (which is identical to the Helmholtz free
energy in case $J = 0$). Since we are going to calculate
this for an approximation of the real PDF, it is necessary
to go through the derivation in order to make sure we do
this in the right fashion and understand all implications.



II. THERMODYNAMICAL INFERENCE

A. Tempered Posterior

In order to take full advantage of the existing thermodynamical
machinery we want to construct the Gibbs
free energy for information problems. To this end, we
introduce a temperature and a source function into the
PDF of the signal posterior as suggested by the definition
of the partition function (3) by defining

$$ \mathcal{P}(s|d,T,J) = \frac{e^{-\beta \left( H(d,s) + J^\dagger s \right)}}{Z_\beta(d,J)} = \frac{\left( \mathcal{P}(d,s) \, e^{-J^\dagger s} \right)^\beta}{\int \mathcal{D}s' \left( \mathcal{P}(d,s') \, e^{-J^\dagger s'} \right)^\beta}. \tag{20} $$

With the temperature we can broaden (for $T > 1$) or narrow
(for $T < 1$) the posterior. Three temperature values
are of special importance, namely $T = 0$, which modifies
the PDF into a delta peak located at the posterior
maximum, $T = 1$, which returns the original posterior,
and $T = \infty$, leading to the maximum entropy state of a
uniform PDF. The source function $J$ permits us to shift
the mean of the PDF to any possible signal configuration
$m = m(d,T,J)$.

The modified PDF will be approximated by a Gaussian
with identical mean and variance:

$$ \mathcal{P}(s|d,T,J) \approx \mathcal{G}(s - m, D) = \tilde{\mathcal{P}}(s|m,D), \tag{21} $$

where also $D = D(d,T,J)$.

We will see that the width $D$ of this Gaussian approximation
of the PDF increases with increasing temperature.
At low temperature ($T \ll 1$) the center of
the PDF is probed and modeled, while at large temperatures
($T \gg 1$) the focus is on its asymptotic tails. Since
the Gaussian in (21) is an approximation, it is not even
guaranteed that $T = 1$ provides the best recipe for signal
reconstruction. E.g. in [9] a case is shown where
signal reconstruction using $T = 0.5$ slightly outperforms
both $T = 0$ and $T = 1$. Since working at multiple temperatures
can reveal different aspects of the same non-Gaussian
PDF (i.e. its central or asymptotic behavior),
the question arises how the differently retrieved Gaussian
approximations can be combined into a single and
more accurate representation of the original PDF. This
will be addressed in Sect. IV. For the moment we approximate
our posterior by a single Gaussian as in (21).

In this case, the partition function can be calculated
explicitly and reads

$$ Z_\beta(d,J) = \left| \frac{2\pi D}{\beta} \right|^{1/2} \exp\left( \frac{\beta}{2} J^\dagger D J - \beta J^\dagger m - \beta H(m,d) \right), $$

where $m$ and $D$ here denote the mean and dispersion at $T = 1$ and $J = 0$.
With standard thermodynamics

$$ \langle H(s,d) \rangle_{(s|d)} = -\left. \frac{\partial \log Z_\beta(d,J)}{\partial \beta} \right|_{J=0} = \frac{N_{\mathrm{dgf}}}{2} \, T + H(m,d), \tag{22} $$

where $N_{\mathrm{dgf}}$ is the dimension of the signal vector. This
result is the re-phrased equipartition theorem (4) from
classical thermodynamics and further motivates the notion
of temperature in IFT.
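The broadening and narrowing of the tempered posterior with $T$ can be made concrete on a one-dimensional grid. The sketch below uses an assumed toy Hamiltonian of log-normal Poisson type with $J = 0$; the grid limits, the data value, and the helper name `tempered_moments` are illustrative choices only.

```python
import numpy as np

s = np.linspace(-15.0, 8.0, 230_001)
ds = s[1] - s[0]
d = 3
H = 0.5 * s**2 - d * s + np.exp(s)          # toy Hamiltonian (an assumption)

def tempered_moments(T):
    """Mean and variance of P(s|d,T,J=0), proportional to exp(-H/T)."""
    w = np.exp(-(H - H.min()) / T)
    w /= w.sum() * ds
    mean = (s * w).sum() * ds
    var = ((s - mean) ** 2 * w).sum() * ds
    return mean, var

variances = {T: tempered_moments(T)[1] for T in (0.1, 0.5, 1.0, 2.0, 10.0)}
print(variances)
```

At low temperature the variance approaches $T$ divided by the curvature of $H$ at its minimum, so only the center of the PDF is probed, while at high temperature the tails dominate.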



B. Internal, Helmholtz and Gibbs energy 

The next step is to calculate the Helmholtz free energy.
In case it can be calculated explicitly from (17), the inference
problem is basically solved, since any (connected)
moment of the signal posterior can directly be calculated
from it by taking derivatives with respect to the moment
generating function $J$, e.g. see (18). This will, however,
only be the case for a very restricted class of Hamiltonians,
like the free ones, which are only quadratic in $s$.
In the more interesting case that the Helmholtz free energy
can not be calculated explicitly, we can use the thermodynamical
relation of the Helmholtz free energy with the
internal energy and entropy.

First, we note that the internal energy of the modified
posterior is given by

$$ U(d,T,J) = \langle H(s,d) \rangle_{(s|d,T,J)} \approx \langle H(s,d) \rangle_{(s|m,D)} = U(d,m,D), \tag{23} $$

where $m$ and $D$ are still functions of $d$, $T$, and $J$. The
average on the right-hand side has to be understood to be
performed over a Gaussian with mean $m$ and dispersion
$D$: $\langle f(s) \rangle_{(s|m,D)} = \int \mathcal{D}s \, f(s) \, \mathcal{G}(s - m, D)$.

Further, we need to calculate the entropy for the modified
PDF, which for a Gaussian depends only on $D$:

$$ S_B[\mathcal{G}(s - m, D)] = \frac{1}{2} \mathrm{Tr}\left( 1 + \log(2\pi D) \right) = S_B(D). \tag{24} $$

For the full modified posterior, (20), the entropy is calculated
via (15) to be

$$ S_B = \beta \left( U + J^\dagger m - F \right), \tag{25} $$

where $m = m(d,T,J) = \langle s \rangle_{(s|d,T,J)}$, $U$ is given by (23),
and $F$ by (17). Solving (25) for the Helmholtz free energy
yields

$$ F_\beta(d,J) = U - T S_B + J^\dagger m. \tag{26} $$

This expresses the Helmholtz free energy in terms of internal
energy and entropy. Unfortunately, this expression
contains the term $J^\dagger m$, where $m$ depends on $J$ implicitly
through (18). In order to get rid of this term, we Legendre
transform with respect to $J$ and thereby use (18),
which provides us with the Gibbs free energy

$$ G_\beta(d,m) = F - J^\dagger \frac{\partial F}{\partial J} = U - T S_B. \tag{27} $$

The Gibbs energy depends solely on $m$ and not on $J$. It
can be constructed approximately, in case approximations
of the internal energy and the entropy are available.
For our Gaussian approximation of the modified posterior
we therefore write

$$ G_\beta(d,m,D) = U(d,m,D) - T \, S_B(D). \tag{28} $$

We know from thermodynamics that the minimum of the
Gibbs free energy with respect to variations in $m$ provides
the expectation value $\langle s \rangle_{(s|d)}$ of our field:

$$ \left. \frac{\delta G(d,m,D)}{\delta m} \right|_{m = \langle s \rangle_{(s|d)}} = 0. \tag{29} $$

Thus, the Gibbs energy is the information energy we were
looking for in the introduction.

Minimizing the Gibbs free energy for a Gaussian PDF
with respect to $m$ yields

$$ 0 = \frac{\partial U}{\partial m} = \int \mathcal{D}s \, H(d,s) \, \frac{\partial \mathcal{G}(s - m, D)}{\partial m} = D^{-1} \langle \phi \, H_m(d,\phi) \rangle_{(\phi|D)}, \tag{30} $$

with $H_m(d,\phi) \equiv H(d, m + \phi)$, which implies

$$ m = \frac{\langle s \, H(d,s) \rangle_{(s|m,D)}}{\langle H(d,s) \rangle_{(s|m,D)}} = \frac{\langle s \, H(d,s) \rangle_{(s|m,D)}}{U(m,D)}. \tag{31} $$

The optimal map is therefore the first signal moment of
the full Hamiltonian weighted with the approximating
Gaussian.

Thermodynamics teaches us further that the propagator,
the uncertainty dispersion of the field, is provided
by the second derivative of the Gibbs free energy around
this location, thanks to the well known relation

$$ \left( \frac{\delta^2 G}{\delta m \, \delta m^\dagger} \right)^{-1} = -\left. \frac{\delta^2 F}{\delta J \, \delta J^\dagger} \right|_{J=0} = \beta D. \tag{32} $$

This relation closes the set of equations by providing $D$.
Evaluating (32) with our approximate Gibbs energy
and using (30) yields

$$ T D^{-1} = \left. \frac{\delta^2 G}{\delta m \, \delta m^\dagger} \right|_{m = \langle s \rangle_{(s|d)}} = -D^{-1} U(d,m,D) + D^{-1} \langle \phi \phi^\dagger H_m(d,\phi) \rangle_{(\phi|D)} D^{-1}. $$

Thus the propagator is the second moment of the Gaussian
weighted Hamiltonian,

$$ D = \frac{\langle \phi \phi^\dagger H_m(d,\phi) \rangle_{(\phi|D)}}{U(d,m,D) + T}. \tag{33} $$

This equation seems to suggest that the propagator evaluated
at higher temperature is narrower, since $T$ appears
in the denominator. However, the opposite is the case
due to the presence of $D$ in all terms, as a test with a
free Hamiltonian will show in (44).
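The moment conditions (30) and (33) can be verified numerically in one dimension. The sketch below is only an illustration under stated assumptions: it uses a toy one-pixel Hamiltonian of log-normal Poisson type with made-up parameter values, evaluates Gaussian averages by Gauss-Hermite quadrature, and finds the stationary point of $G$ with a simple Newton-like iteration that is not taken from the text.

```python
import numpy as np

# Toy Hamiltonian H(s) = s^2/(2 S) - d b s + kappa exp(b s); all values assumed.
S, b, kappa, d, T = 1.0, 1.0, 1.0, 3.0, 1.0

def hamiltonian(s):
    return 0.5 * s**2 / S - d * b * s + kappa * np.exp(b * s)

# Gauss-Hermite rule for averages over the Gaussian G(s - m, D).
x, w = np.polynomial.hermite.hermgauss(80)

def gauss_avg(f, m, D):
    """<f(s)> over a Gaussian with mean m and variance D."""
    return np.sum(w * f(m + np.sqrt(2.0 * D) * x)) / np.sqrt(np.pi)

def gibbs(m, D):
    """Approximate Gibbs free energy G = U - T S_B of the Gaussian surrogate."""
    U = gauss_avg(hamiltonian, m, D)
    S_B = 0.5 * (1.0 + np.log(2.0 * np.pi * D))
    return U - T * S_B

# Newton-like iteration on the stationarity conditions of G in m and D.
m, D = 0.0, S
for _ in range(200):
    k_eff = kappa * np.exp(b * m + 0.5 * b**2 * D)   # Gaussian average of kappa exp(b s)
    grad_m = m / S - d * b + b * k_eff               # dG/dm
    D = T / (1.0 / S + b**2 * k_eff)                 # solves dG/dD = 0 at fixed k_eff
    m = m - D / T * grad_m

U = gauss_avg(hamiltonian, m, D)
first_moment = gauss_avg(lambda s: (s - m) * hamiltonian(s), m, D)
second_moment = gauss_avg(lambda s: (s - m) ** 2 * hamiltonian(s), m, D)
print(first_moment, second_moment, D * (U + T))
```

At the minimum, the first moment of $\phi H_m$ vanishes and the second moment equals $D\,(U + T)$, as (30) and (33) require.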



C. Cross information 

The Gibbs free energy at $T = 1$ is directly related to
the cross information between the posterior and its Gaussian
approximation. The cross information (or negative
relative entropy) of a PDF $\tilde{\mathcal{P}}$ with respect to another
one $\mathcal{P}$ is measured by the so-called Kullback-Leibler divergence
[26]:

$$ d_{\mathrm{KL}}[\tilde{\mathcal{P}}, \mathcal{P}] = \int \mathcal{D}s \, \tilde{\mathcal{P}}(s|d) \log\left( \frac{\tilde{\mathcal{P}}(s|d)}{\mathcal{P}(s|d)} \right). \tag{34} $$

The Kullback-Leibler divergence characterizes the distance
between a surrogate and target PDF in an information
theoretical sense. It is an asymmetric distance measure,
reflecting that the roles of the two involved PDFs
differ. The equivalence of Gibbs free energy and cross information
with respect to inference problems can easily
be seen:

$$ \begin{aligned} G(m,D) &= \left\langle H(d,s) + \log \mathcal{G}(s - m, D) \right\rangle_{(s|m,D)} \\ &= \int \mathcal{D}s \, \mathcal{G}(s - m, D) \log\left( \frac{\mathcal{G}(s - m, D)}{\mathcal{P}(s,d)} \right) \\ &\hat{=} \int \mathcal{D}s \, \mathcal{G}(s - m, D) \log\left( \frac{\mathcal{G}(s - m, D)}{\mathcal{P}(s|d)} \right) \\ &= d_{\mathrm{KL}}[\tilde{\mathcal{P}}, \mathcal{P}]. \end{aligned} \tag{35} $$

In the second last step we added the term $\log \mathcal{P}(d)$, which
is irrelevant here, since it is $m$- and $D$-independent, and in
the last step we introduced the Kullback-Leibler divergence
between the posterior $\mathcal{P}(s|d)$ and its Gaussian surrogate
$\tilde{\mathcal{P}}(s|d) = \mathcal{G}(s - m, D)$. Minimal Gibbs free energy
therefore seems to correspond to minimal Kullback-Leibler
divergence, and therefore to maximal cross information
of the surrogate with the exact posterior.

However, we have only minimized the Gibbs free energy
so far with respect to $m$, the mean field degrees of
freedom of our Gaussian, not with respect to the ones
parameterizing the uncertainty dispersion $D$. We have determined
the latter using the thermodynamical relation (32).
If we want our surrogate PDF to have maximal cross information
with the posterior with respect to all degrees
of freedom of our Gaussian, we also have to minimize
the Gibbs energy with respect to $D$. A short calculation
shows that this actually yields a result which is equivalent
to the thermodynamical relation (32):

$$ \frac{\delta G}{\delta D} = \frac{1}{2} D^{-1} \left[ \langle \phi \phi^\dagger H_m(d,\phi) \rangle_{(\phi|D)} - D \left( U(m,D) + T \right) \right] D^{-1} = 0, $$

from which also (33) follows. Thus, we can regard both
the map $m$ and its uncertainty covariance $D$ as parameters
for which the Gibbs energy should be minimized.
We will refer to this as the maximal cross information
principle.

We further note that the maximal cross information
principle also holds if the Gaussian is replaced by some
other model function, $G[\tilde{\mathcal{P}}(s|d)] \,\hat{=}\, d_{\mathrm{KL}}[\tilde{\mathcal{P}}, \mathcal{P}]$, a property
we will use later in Sect. IV.

Note that the maximal cross information principle and the
thermodynamical relations yield exactly the same results for
$m$ and $D$ only if $G(m,D)$ is calculated exactly. In case
there are approximations involved, the resulting algorithms
differ slightly, and this difference can be used to
monitor the impact of the approximation made. In the
following, we use the maximal cross information principle
for our examples.
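The statement that $G(m,D)$ at $T = 1$ and the Kullback-Leibler divergence differ only by the constant $-\log \mathcal{P}(d)$ can be checked on a grid. The following sketch assumes a one-dimensional toy Hamiltonian; the grid, the data value, and the function names are illustrative assumptions.

```python
import numpy as np

s = np.linspace(-20.0, 12.0, 320_001)
ds = s[1] - s[0]
d = 3.0
H = 0.5 * s**2 - d * s + np.exp(s)           # toy Hamiltonian H(d, s), an assumption

# log of the evidence P(d) = integral of exp(-H), computed stably on the grid
log_Z = np.log(np.sum(np.exp(-(H - H.min()))) * ds) - H.min()
log_post = -H - log_Z                         # log of the exact posterior P(s|d)

def log_gauss(m, D):
    return -0.5 * (s - m) ** 2 / D - 0.5 * np.log(2.0 * np.pi * D)

def gibbs_and_kl(m, D):
    """Gibbs free energy (T = 1) and KL divergence of the Gaussian surrogate."""
    lg = log_gauss(m, D)
    g = np.exp(lg)
    gibbs = np.sum(g * (H + lg)) * ds         # G = U - S_B
    kl = np.sum(g * (lg - log_post)) * ds     # d_KL[surrogate, posterior]
    return gibbs, kl

for m, D in [(0.5, 0.3), (1.0, 0.5), (0.0, 1.0)]:
    gibbs, kl = gibbs_and_kl(m, D)
    print(m, D, gibbs - kl, -log_Z)
```

The difference is the same for every choice of $m$ and $D$, so both functionals share their minimum.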



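D. Calculating the internal energy

In order to calculate the approximative Gibbs energy,
we need to estimate the internal energy, for which we
have to specify the exact Hamiltonian. We assume that
it can be Taylor-Fréchet expanded as

$$ H(d,s) = \sum_{n=0}^{\infty} \frac{1}{n!} \Lambda^{(n)}_{x_1 \ldots x_n} s_{x_1} \cdots s_{x_n} \equiv \sum_{n=0}^{\infty} \frac{1}{n!} \Lambda^{(n)}(s, \ldots, s), \tag{36} $$

where repeated coordinates are thought to be integrated
or summed over. The approximative internal energy is
then

$$ U(m,D) = U[\tilde{\mathcal{P}}(s|d)] = \int \mathcal{D}s \, H(d,s) \, \tilde{\mathcal{P}}(s|d) = \sum_{n=0}^{\infty} \frac{1}{n!} \left\langle \Lambda^{(n)}(s, \ldots, s) \right\rangle_{(s|m,D)}. \tag{37} $$

The Gaussian $n$-point correlation functions in this equation
can actually be calculated analytically. For this, we
again use the shifted field $\phi = s - m$, which has the
Hamiltonian

$$ H_m(d,\phi) = \sum_{n=0}^{\infty} \frac{1}{n!} \Lambda_m^{(n)}(\phi, \ldots, \phi), \quad \text{with} \quad \Lambda_m^{(n)}(\phi, \ldots, \phi) = \sum_{k=0}^{\infty} \frac{1}{k!} \Lambda^{(n+k)}(\phi, \ldots, \phi, m, \ldots, m). \tag{38} $$

We assume that the interaction coefficients $\Lambda^{(n)}_{x_1 \ldots x_n}$ are
symmetric with respect to index permutations, since they
resulted from a Taylor-Fréchet expansion.

The internal energy can then be calculated via the
Wick theorem and the fact that all odd moments of $\phi$
vanish:

$$ \begin{aligned} U(m,D) &= \sum_{n=0}^{\infty} \frac{1}{n!} \left\langle \Lambda_m^{(n)}(\phi, \ldots, \phi) \right\rangle_{(\phi|D)} \\ &= \sum_{n=0}^{\infty} \frac{1}{2^n n!} \Lambda_m^{(2n)}(D \otimes \cdots \otimes D) \\ &= \sum_{n,k=0}^{\infty} \frac{1}{2^n n! \, k!} \Lambda^{(2n+k)}(D \otimes \cdots \otimes D \otimes m \otimes \cdots \otimes m). \end{aligned} \tag{39} $$

Here, we defined the symmetrized tensor product of a rank-$j$ tensor $T$
and a rank-$(n-j)$ tensor $T'$,
$(T \otimes T')_{x_1 \ldots x_n} = \frac{1}{n!} \sum_{\pi \in S_n} T_{x_{\pi(1)} \ldots x_{\pi(j)}} T'_{x_{\pi(j+1)} \ldots x_{\pi(n)}}$, by averaging
over all permutations in $S_n$, the symmetric group.

Having obtained the internal energy with (39), and the
entropy with (24) approximatively, we can construct the
Gibbs free energy according to (28), which we use for our
inference.

E. Minimizing

In order to get our optimal Gaussian approximation
to the posterior, we have to minimize $G(m,D)$ with
respect to $m$ and $D$. Minimizing for $m$ is equivalent to
minimizing the internal energy, since the entropy does
not depend on $m$. This yields

$$ 0 = \frac{\delta U(m,D)}{\delta m} = \sum_{n,k=0}^{\infty} \frac{1}{2^n n! \, k!} \Lambda^{(2n+k+1)}(\cdot, D \otimes \cdots \otimes D \otimes m \otimes \cdots \otimes m), \tag{40} $$

which has to be solved for $m$ for any given $D$. The propagator
derives from (32) or from

$$ 0 = \frac{\delta G(m,D)}{\delta D} \quad \Rightarrow \quad T D^{-1} = \sum_{n,k=0}^{\infty} \frac{1}{2^n n! \, k!} \Lambda^{(2n+k+2)}(\cdot, \cdot, D \otimes \cdots \otimes D \otimes m \otimes \cdots \otimes m), \tag{41} $$

which also depends on $m$. Thus, (40) and (41) have to
be solved simultaneously.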
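The Wick-theorem expression (39) for the internal energy can be cross-checked in one dimension, where the tensors $\Lambda^{(n)}$ reduce to numbers. The sketch below compares the double sum with a direct Gaussian average computed by Gauss-Hermite quadrature; the coefficient list `lam` is a made-up example.

```python
import numpy as np
from math import factorial

# Hypothetical Taylor coefficients Lambda^(n) of a 1D polynomial Hamiltonian,
# H(s) = sum_n Lambda^(n) s^n / n!
lam = [0.3, -1.0, 2.0, 0.5, 1.5, 0.0, 0.8]

def hamiltonian(s):
    return sum(l * s**n / factorial(n) for n, l in enumerate(lam))

def internal_energy_wick(m, D):
    """U(m, D) as the double sum over Lambda^(2n+k) D^n m^k / (2^n n! k!)."""
    total = 0.0
    for n in range(len(lam)):
        for k in range(len(lam)):
            if 2 * n + k < len(lam):
                total += (lam[2 * n + k] * D**n * m**k
                          / (2**n * factorial(n) * factorial(k)))
    return total

def internal_energy_quadrature(m, D, deg=20):
    """Same Gaussian average by Gauss-Hermite quadrature (exact for polynomials)."""
    x, w = np.polynomial.hermite.hermgauss(deg)
    return np.sum(w * hamiltonian(m + np.sqrt(2.0 * D) * x)) / np.sqrt(np.pi)

print(internal_energy_wick(0.7, 0.4), internal_energy_quadrature(0.7, 0.4))
```

In higher dimensions the same bookkeeping applies with the symmetrized tensor products defined above.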

A simple example should be in order. The simplest
case is that of the original Hamiltonian being quadratic.
The approximated one should then match this exactly. A
quadratic or free Hamiltonian is equivalent to a Gaussian
posterior, $\mathcal{P}(s|d) = \mathcal{G}(s - m_*, D_*)$. We get

$$ \begin{aligned} H(d,s) &\,\hat{=}\, \frac{1}{2} (s - m_*)^\dagger D_*^{-1} (s - m_*) \\ &\,\hat{=}\, \Lambda^{(1)}_i s_i + \frac{1}{2} \Lambda^{(2)}_{ij} s_i s_j, \quad \text{with} \\ \Lambda^{(1)} &= -D_*^{-1} m_*, \quad \text{and} \quad \Lambda^{(2)} = D_*^{-1}. \end{aligned} \tag{42} $$

Inserting this into (40) and (41) yields

$$ 0 = \Lambda^{(1)}(\cdot) + \Lambda^{(2)}(m, \cdot) = D_*^{-1} (m - m_*) \quad \Rightarrow \quad m = m_*, \tag{43} $$

$$ T D^{-1} = \Lambda^{(2)}(\cdot, \cdot) = D_*^{-1} \quad \Rightarrow \quad D = T D_*, \tag{44} $$

which indeed recovers the original coefficients for $T = 1$,
and a narrower or wider uncertainty dispersion for $T < 1$
or $T > 1$, respectively. In the following, we will see that
also in case of interacting Hamiltonians the minimal free
energy principle provides the correct results. We show
this by reproducing (and extending) signal estimators derived
previously in IFT using renormalization techniques.
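The free-Hamiltonian result $m = m_*$, $D = T D_*$ can be confirmed numerically by evaluating $G = U - T S_B$ in closed form and perturbing around the claimed minimum. This is a small sketch with randomly generated, assumed values for $m_*$ and $D_*$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 0.5
m_star = rng.normal(size=n)                  # assumed posterior mean
A = rng.normal(size=(n, n))
D_star = A @ A.T + n * np.eye(n)             # assumed posterior covariance
D_star_inv = np.linalg.inv(D_star)

def gibbs(m, D):
    """G = U - T S_B for the free Hamiltonian and a Gaussian surrogate G(s - m, D)."""
    phi = m - m_star
    U = 0.5 * phi @ D_star_inv @ phi + 0.5 * np.trace(D @ D_star_inv)
    S_B = 0.5 * (n + np.linalg.slogdet(2.0 * np.pi * D)[1])
    return U - T * S_B

m_opt, D_opt = m_star, T * D_star            # the solution claimed for a free Hamiltonian
g_opt = gibbs(m_opt, D_opt)

# Random symmetric perturbations should never lower the free energy.
worse = []
for _ in range(100):
    dm = 0.1 * rng.normal(size=n)
    B = 0.05 * rng.normal(size=(n, n))
    worse.append(gibbs(m_opt + dm, D_opt + B + B.T) >= g_opt)

print(all(worse))
```

Since $U$ is linear in $D$ and $-T S_B$ is convex in $D$, the stationary point is a global minimum, which is why no perturbation lowers $G$.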

III. APPLICATION EXAMPLES
A. Poissonian log-normal data

1. Separable case

Many inference problems have to deal with Poissonian
noise, like X-ray and $\gamma$-ray astronomy as well as reconstruction
of the cosmic large-scale structure from galaxy
counts. Let us assume that the mean count rate $\lambda$ of
photons or galaxies is proportional to an exponentiated
Gaussian random field $s$ with covariance $S = \langle s s^\dagger \rangle_{(s)}$
according to

$$ \lambda(s) = \kappa \, e^{b s}. \tag{45} $$

Here, $\kappa$ is the expected counts for $s = 0$, which may depend
on the spatial position. The scalar $b$ permits us to
change conveniently the strength of the non-linearity of
the problem without changing the signal statistics. This
log-normal model for the cosmic large-scale structures as
an approximative description is actually supported observationally
[23, and theoretically [2^-[3J| .

As a starting point, we assume a local response, so that
the Poisson statistics for the actual counts $d_x$ at location
$x$ are

$$ P(d_x|\lambda_x) = \frac{\lambda_x^{d_x}}{d_x!} \, e^{-\lambda_x}, \tag{46} $$

and the full likelihood is well separable into local ones:

$$ P(d|s) = \prod_x P(d_x|\lambda_x(s_x)). \tag{47} $$

The corresponding Hamiltonian was shown in [9] to be

$$ H(d,s) \,\hat{=}\, \frac{1}{2} s^\dagger S^{-1} s - d^\dagger b \, s + \kappa^\dagger e^{b s}. \tag{48} $$

Reconstruction methods for this data model were developed
by [i[3ii3-.

The internal energy of our Gaussian approximation can
be calculated analytically,

$$ U(m,D) = \frac{1}{2} m^\dagger S^{-1} m + \frac{1}{2} \mathrm{Tr}(D S^{-1}) - d^\dagger b \, m + \kappa^\dagger e^{b m + \frac{b^2}{2} \hat{D}}, \tag{49} $$

where $\hat{D}$ denotes the vector of diagonal elements of $D$.

Minimizing $G(m,D) = U(m,D) - T S_B(D)$ with respect
to $m$ and $D$ yields

$$ m = S \, b \left( d - \kappa_{m + \frac{b}{2} \hat{D}} \right), \quad \text{and} \quad D = T \left[ S^{-1} + b^2 \, \widehat{\kappa_{m + \frac{b}{2} \hat{D}}} \right]^{-1}, \tag{50} $$

respectively. Here we have defined $\kappa_t = \kappa \exp(b t)$ and
denote a diagonal matrix by putting a hat onto a vector
of its diagonal elements, $(\hat{\lambda})_{xy} = \lambda_x \delta_{xy}$. This result is
identical with the one found in [9] using a lengthy renormalization
calculation. There it was found by numerical
experiment that using $T = 0.5$ in (50) seems to produce
slightly better results than $T = 0$ and $T = 1$.
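The coupled equations (50) for $m$ and $D$ have to be solved numerically. One possible way, sketched below, is a Newton-like iteration in which $D/T$ serves as the inverse Hessian of $G$ with respect to $m$. This scheme is an assumption made for illustration and is not the algorithm of the cited work; the function name `solve_separable` and the three-pixel test data are hypothetical, and convergence is not guaranteed for arbitrary inputs.

```python
import numpy as np

def solve_separable(d, S, kappa, b=1.0, T=1.0, n_iter=200):
    """Iterate the coupled equations (50) for the map m and the covariance D."""
    S_inv = np.linalg.inv(S)
    m = np.zeros(len(d))
    D = S.copy()
    for _ in range(n_iter):
        # kappa_{m + b D_hat / 2}: expected counts under the Gaussian surrogate
        k_eff = kappa * np.exp(b * m + 0.5 * b**2 * np.diag(D))
        D = T * np.linalg.inv(S_inv + b**2 * np.diag(k_eff))
        grad = S_inv @ m - b * (d - k_eff)     # gradient of G with respect to m
        m = m - (D / T) @ grad                 # Newton-like update
    return m, D

# Hypothetical three-pixel example.
idx = np.arange(3)
S = np.exp(-np.abs(idx[:, None] - idx[None, :]))   # assumed prior covariance
d = np.array([3.0, 0.0, 1.0])                       # assumed counts
kappa = np.ones(3)
m, D = solve_separable(d, S, kappa)
print(m)
print(np.diag(D))
```

Setting `T=0.5` in the call would give the variant that the text reports as performing slightly better in numerical experiments.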



2. Entangled case 

So far, we assumed that the response provides a one to 
one correspondence between locations in signal and data 
space. However, for most measurements this is not exactly
true. X- and $\gamma$-ray telescopes typically exhibit point
spread functions, which map a single signal space location 
onto several detectors, of which each detects events com- 
ing from several indistinguishable directions. Also galaxy 
redshifts do not provide accurate distance information, 
since redshift distortions and measurement errors lead to 
effective point spread functions. 

In the following, we generalize to the case of a known 
and fixed, but non-local measurement response. Fixed 
means, that the response is independent of the signal. 
This excludes the treatment of galaxy redshift distortions 
with this case (e.g. see [s^ for this), but still includes 
photometric redshift errors of galaxy catalogs as well as 
X- and $\gamma$-ray telescope data. Such problems have been
approached in the past via the MAP principle (S^-H^. 

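The point spread function is modeled by the response
matrix $R = (R_{ix})$ which describes how emissivity at location
$x$ is expected to be observed in data channel $i$.
The expected count rate is now

$$ \lambda(s) = R \, e^{b s}, \tag{51} $$

and the likelihood does not separate any more with respect
to $x$,

$$ P(d|s) = \prod_i P(d_i|\lambda_i(s)), \tag{52} $$

since $\lambda_i(s)$ entangles the signal from several locations,
whereas in (47) it depends only on the local signal value.
We recover the former case for a diagonal response $R_{ix} = \kappa_x \delta_{ix}$.
The resulting Hamiltonian

$$ H(s|d) \,\hat{=}\, \frac{1}{2} s^\dagger S^{-1} s + 1^\dagger R \, e^{b s} - d^\dagger \log(R \, e^{b s}) \tag{53} $$

reduces to (48) for $R$ being diagonal.

The internal energy of our surrogate Gaussian
$\tilde{\mathcal{P}}(s|d) = \mathcal{G}(s - m, D)$ is then

$$ U(m,D) = \frac{1}{2} m^\dagger S^{-1} m + \frac{1}{2} \mathrm{Tr}(D S^{-1}) + 1^\dagger R \, e^{b m + \frac{b^2}{2} \hat{D}} - \sum_i d_i \underbrace{\int \mathcal{D}\phi \, \log\left( R_i \, e^{b (m + \phi)} \right) \mathcal{G}(\phi, D)}_{I_i}. \tag{54} $$

This integral $I_i$ can not be calculated in closed form due
to the logarithm in the integrand. We expand the logarithm
around $R_i e^{b m}$, since we will see that this recovers
the result of the separable case most easily for $R$ being
diagonal. We get

$$ I_i = \log(R_i e^{b m}) + \left\langle \log\left( \frac{R_i \, e^{b (m + \phi)}}{R_i \, e^{b m}} \right) \right\rangle_{(\phi|D)}. \tag{55} $$

In case $R$ is diagonal, the first term reduces to $b \, m_i + \log R_{ii}$,
the second vanishes as $\langle \log(\exp(b \phi)) \rangle_{(\phi|D)} = \langle b \phi \rangle_{(\phi|D)} = 0$,
and we recover the Hamiltonian of the
separable case.

In the general case of an entangling response we Taylor
expand the logarithm of the second term

$$ I_i = \log(R_i e^{b m}) - \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \, \mathrm{II}_{i,n}, \quad \text{with} \quad \mathrm{II}_{i,n} = \left\langle \left( r_i^\dagger e^{b \phi} - 1 \right)^n \right\rangle_{(\phi|D)} \tag{56} $$

and

$$ r_i(x) = \frac{R_i(x) \, e^{b m(x)}}{\int dx' \, R_i(x') \, e^{b m(x')}}. $$

We note that $r_i^\dagger 1 = \int dx \, r_{ix} = 1$ by construction.

The expansion coefficients $\mathrm{II}_{i,n}$ can be worked out one
by one. We provide here the first few, namely

$$ \begin{aligned} \mathrm{II}_{i,1} &= r_i^\dagger e^{\frac{b^2}{2} \hat{D}} - 1, \\ \mathrm{II}_{i,2} &= r_{ix} r_{iy} \exp\Big( \frac{b^2}{2} \sum_{u,v \in \{x,y\}} D_{uv} \Big) - 2 \, \mathrm{II}_{i,1} - 1, \\ \mathrm{II}_{i,3} &= r_{ix} r_{iy} r_{iz} \exp\Big( \frac{b^2}{2} \sum_{u,v \in \{x,y,z\}} D_{uv} \Big) - 3 \, r_{ix} r_{iy} \exp\Big( \frac{b^2}{2} \sum_{u,v \in \{x,y\}} D_{uv} \Big) + 3 \, \mathrm{II}_{i,1} + 2, \end{aligned} \tag{57} $$

where repeated coordinates $x$, $y$, $z$ are integrated or summed over.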

These coefficients stay small if $b^2 D \ll 1$, which means
that the expansion can be truncated if the signal is known
to within a few tens of percent or if the non-Gaussianity is
small. Large uncertainties in the signal strength do not
necessarily lead to large coefficients if they are located at
positions without instrumental sensitivity ($R_{ix}$ small) or
with much lower expected count rates ($m_x$ small). In both
cases mostly prior information and extrapolation from
regions with more informative data will determine the
solution at such locations.

If some of these coefficients are large, substantial
signal uncertainty must be present at the locations to which
they are sensitive. In this case an accurate reconstruction
for these locations cannot be expected. Thus, if we simplify
the Hamiltonian by dropping such terms, even if they are
relatively large, the quality of the reconstruction will not
suffer too much, since only regions are affected which are
poorly constrained by the data anyway. Therefore, truncating
the expansion should already provide usable algorithms.
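To get a feeling for how the size of the coefficients depends on the uncertainty dispersion, the first two of them can be evaluated directly from the Gaussian moments $\langle e^{b \phi_x} \rangle_{(\phi|D)} = e^{b^2 D_{xx}/2}$ and $\langle e^{b (\phi_x + \phi_y)} \rangle_{(\phi|D)} = e^{b^2 (D_{xx} + 2 D_{xy} + D_{yy})/2}$. The Python sketch below is illustrative only: the function name `pi_coefficients` and the two-pixel numbers are hypothetical, and a single data channel $i$ is assumed.

```python
import math

def pi_coefficients(r, D, b=1.0):
    """Return (Pi_1, Pi_2) for one data channel.

    r : normalized weights r_i(x), summing to one
    D : uncertainty covariance as a list of lists
    """
    n = len(r)
    # a1 = < r^T e^{b phi} >
    a1 = sum(r[x] * math.exp(0.5 * b * b * D[x][x]) for x in range(n))
    # a2 = < (r^T e^{b phi})^2 >
    a2 = sum(r[x] * r[y] * math.exp(0.5 * b * b * (D[x][x] + 2.0 * D[x][y] + D[y][y]))
             for x in range(n) for y in range(n))
    pi1 = a1 - 1.0               # < r^T (e^{b phi} - 1) >
    pi2 = a2 - 2.0 * a1 + 1.0    # < (r^T (e^{b phi} - 1))^2 >
    return pi1, pi2

# Hypothetical weights and covariances: a well and a poorly constrained signal.
r = [0.6, 0.4]
D_small = [[0.01, 0.005], [0.005, 0.02]]
D_large = [[1.0, 0.5], [0.5, 2.0]]

pi_small = pi_coefficients(r, D_small)
pi_large = pi_coefficients(r, D_large)
```

For the small dispersion both coefficients are of order $b^2 D$ or smaller, whereas for the large one they exceed unity, so that truncating the series is then no longer controlled at the affected locations.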



3. Zeroth order solution 

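To zeroth order, we ignore all $\Pi_{i,n}$-terms and find for
the approximative free energy

$$ G(m,D) \approx \tfrac{1}{2}\, m^\dagger S^{-1} m + \tfrac{1}{2}\, \mathrm{Tr}(D\, S^{-1}) + 1^\dagger R\, e^{b m + b^2 \hat{D}/2} - d^\dagger \log(R\, e^{b m}) - \frac{T}{2}\, \mathrm{Tr}(1 + \log(2 \pi D)). \tag{58} $$

Minimizing this with respect to $m$ and $D$ yields

$$ m = S\, b \left( \sum_i d_i\, r_i - \kappa'(m + b \hat{D}/2) \right), \quad\text{and} $$
$$ D = T \left( S^{-1} + b^2\, \widehat{\kappa'(m + b \hat{D}/2)} \right)^{-1}, \quad\text{with} $$
$$ \kappa'(t) = \sum_i R_i\, e^{b t}. \tag{59} $$

This is very similar to (50) and reduces to it for a diagonal
response.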

4. First order correction

First order corrections are included by keeping the
$\Pi_{i,1}$-term in the approximative free energy, but ignoring
higher terms. The resulting equations are

$$ m = S\, b \left( \sum_i d_i \left( 1 + r_i^\dagger e^{b^2 \hat{D}/2} \right) r_i - \kappa''(m + b \hat{D}/2) \right), $$
$$ D = T \left( S^{-1} + b^2\, \widehat{\kappa''(m + b \hat{D}/2)} \right)^{-1}, \quad\text{with} $$
$$ \kappa''(t) = \sum_i R_i\, e^{b t} \left( 1 + \frac{d_i}{R_i^\dagger e^{b m}} \right). \tag{60} $$

This is a slight modification with respect to (59) in two
aspects. The map changes a bit, but the sign of the
change depends on the details of the point spread function,
since there are two new terms of similar order, but
with opposite signs. The uncertainty variance is reduced,
since the term added to the inverse propagator is always
positive.



5. Observation with background

The observation may suffer from a background, events
in data space which do not contribute to our signal
knowledge. For example, γ-ray astronomy has to suppress
cosmic ray events as much as possible, since charged
particles do not point back to the same sources as neutral
photons due to cosmic magnetic fields. Fortunately, cosmic
rays have different signatures in data space due to the
differences in hadronic and electromagnetic interactions.
However, the distinction is not clear-cut for all measured
events, and we have to use prior knowledge to suppress
the background events.

Therefore we should extend our formalism to also take
such unwanted backgrounds into account. Actually, a
reinterpretation of the above formulas will do. We extend
our signal space by the quantity $f$ determining the
logarithm of the background count rate, $s \rightarrow s' = (s, f)$.
$f_z$ might be a field over the same physical space as $s_x$, or
just a single number such as a total isotropic cosmic ray flux.
In any case, the $x$- and $z$-coordinates are regarded to be
over different spaces, or distinct areas of the joint space
over which $f$ and $s$ live. The joint covariance reads

$$ S' = \begin{pmatrix} S & 0 \\ 0 & F \end{pmatrix} \tag{61} $$

due to the independence of signal and background. Here,
$F = \langle f f^\dagger \rangle_{(f)}$ is the log-background covariance. The
response $R \rightarrow R'$ has to be extended to map also the
background space into the data space. Whether the response
images of signal and background events in data space are
well separated or whether they overlap determines the
background discriminating power of the instrument.

The combined map and covariance of signal and
log-background can now be obtained, e.g. from (59) or
(60), with the replacements $S, R, m, D \rightarrow S', R', m', D'$.
Our joint map can be split into a signal and log-background
part, $m' = (\bar{s}, \bar{f})$. Since we are usually not interested
in the background properties, we marginalize over it. This is
especially simple in the Gaussian approximation of our joint
posterior $P(s'|d) \approx G(s' - m', D')$, with $s' = (s, f)$,
$m' = (\bar{s}, \bar{f})$,

$$ m_x \approx \int \mathcal{D}s'\, s_x\, G(s' - m', D') = \bar{s}_x, \quad\text{and} \tag{62} $$
$$ D_{xy} \approx \int \mathcal{D}s'\, (s - \bar{s})_x\, (s - \bar{s})_y\, G(s' - m', D') = D'_{xy}. $$

Although this does not look too different from the formula
for the case without background, the effect of the
background enters through the joint covariance matrix
$D'$, which mixes the contributions from the signal and
background events appropriately.

B. Reconstruction without spectral knowledge

1. Effective theory 

The reconstruction of the signal in the Poisson log-normal
model in the previous section assumed that the
signal covariance is known a priori. In case it is unknown,
it has to be extracted from the same data used for the
signal inference [?]. However, the optimal way to do
this was usually not derived from first principles, maybe
except in [48-50]. A rigorous approach to such problems
is given by the computationally expensive Gibbs-sampling
technique, which investigates the joint space of
signal realizations and power spectra [51-54], and which can
then easily be marginalized over the power spectra to
obtain a generic signal reconstruction. This problem was
also addressed approximatively for the case of linear
response data from a Gaussian signal subject to Gaussian
noise using the MAP principle as well as with the help of
parameter uncertainty renormalized estimation by [1].
We re-address this problem here using the minimal free
energy approach.

We assume the covariance $S = \langle s\, s^\dagger \rangle_{(s)}$ of our Gaussian
signal $s$ to be diagonal within some known function
basis $O_{kx}$, e.g. the Fourier basis with $O_{kx} = e^{i k x}$. We
model the power spectrum (in this basis) as being a linear
combination of a number of positive basis functions
$f_i(k)$ with disjoint supports (the spectral bands), so that

$$ P_s(k) = \sum_i p_i\, f_i(k) \tag{63} $$

is positive for all $k$ (all coefficients of $p = (p_i)_i$ are positive
and the spectral bands cover the full $k$-space domain).
We define

$$ (S_i)_{xy} = (O^\dagger \hat{f}_i\, O)_{xy} = O^\dagger_{xk}\, f_i(k)\, O_{ky} \tag{64} $$

to be the $i$-th spectral band matrix and $S_i^{-1}$ to be its
pseudo-inverse. Thus, we write our signal covariance as

$$ S = \sum_i p_i\, S_i, \tag{65} $$

with $p = (p_i)$ the vector of unknown spectral parameters.
We further assume that the individual signal-band
amplitudes $p_i$ have an independent prior distribution,

$$ P(p) = \prod_i P(p_i), \tag{66} $$



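with the individual priors being inverse-gamma distributions,
power-laws with exponential low amplitude cutoff
at $q_i$:

$$ P(p_i) = \frac{1}{q_i\, \Gamma(\alpha_i - 1)} \left( \frac{p_i}{q_i} \right)^{-\alpha_i} \exp\!\left( -\frac{q_i}{p_i} \right). \tag{67} $$

For $\alpha_i \gg 1$ this is an informative prior, where $q_i/\alpha_i$
determines the preferred value. A non-informative prior
would be given by Jeffreys prior with $\alpha_i = 1$ and $q_i = 0$
(since this would result in an improperly normalized prior, we
understand this as $\alpha_i = 1 + \epsilon$, $q_i = \epsilon$, and
$\lim_{\epsilon \rightarrow 0}$ at the end of the calculation).
For a linear data model

$$ d = R\, s + n, \tag{68} $$

with Gaussian noise with covariance $N = \langle n\, n^\dagger \rangle_{(n)}$, the
parameter marginalized effective Hamiltonian is according
to [1]

$$ H(d,s) \mathrel{\widehat{=}} \tfrac{1}{2}\, s^\dagger M s - j^\dagger s + \sum_i \gamma_i \log\!\left( q_i + \tfrac{1}{2}\, s^\dagger S_i^{-1} s \right). \tag{69} $$

Here $M = R^\dagger N^{-1} R$, $j = R^\dagger N^{-1} d$, $\gamma_i = \alpha_i - 1 + \varrho_i/2$,
and $\varrho_i = \mathrm{Tr}[S_i^{-1} S_i]$ the number of spectral degrees of
freedom within the band $i$.



2. Free energy expansion

The internal energy of a Gaussian posterior-ansatz is
then

$$ U(m,D) = \tfrac{1}{2}\, m^\dagger M m + \tfrac{1}{2}\, \mathrm{Tr}(D\, M) - j^\dagger m + \sum_i \gamma_i \left\langle \log\!\left( q_i + \tfrac{1}{2}\, s^\dagger S_i^{-1} s \right) \right\rangle_{(s|m,D)}. \tag{70} $$

Again we have to deal with a Gaussian average over a
logarithm, $I_i$, which we expand as

$$ I_i = \log(\tilde{q}_i) - \sum_{k=1}^{\infty} \frac{(-1)^k}{k\, \tilde{q}_i^{\,k}}\, \Pi_{i,k}, \quad \Pi_{i,k} = \left\langle \left( q_i + \tfrac{1}{2}\, s^\dagger S_i^{-1} s - \tilde{q}_i \right)^k \right\rangle_{(s|m,D)}, $$
$$ \text{with}\quad \tilde{q}_i = q_i + \tfrac{1}{2}\, \mathrm{Tr}\!\left( (m\, m^\dagger + \delta\, D)\, S_i^{-1} \right). \tag{71} $$

Here we have introduced a parameter $\delta$ to be fixed soon.
The first two expansion coefficients are

$$ \Pi_{i,1} = \tfrac{1}{2}\, (1 - \delta)\, \mathrm{Tr}(D\, S_i^{-1}), $$
$$ \Pi_{i,2} = \Pi_{i,1}^2 + \mathrm{Tr}\!\left( (m\, m^\dagger + \tfrac{1}{2} D)\, S_i^{-1} D\, S_i^{-1} \right). \tag{72} $$



3. Zeroth order solution

To zeroth order we find by minimizing the free energy
while ignoring the $\Pi$-corrections

$$ m = D' j, \quad D = T\, D', \quad\text{and}\quad D' = \Big( M + \sum_i p_i^{-1} S_i^{-1} \Big)^{-1}. \tag{73} $$

This means that the map is the Wiener filtered data,
where the spectral coefficients are assumed to be

$$ p_i = \frac{1}{\gamma_i} \left[ q_i + \tfrac{1}{2}\, \mathrm{Tr}\!\left( (m\, m^\dagger + \delta\, D)\, S_i^{-1} \right) \right]. \tag{74} $$

For $\delta = \infty$ this yields $p_i = \infty$ and therefore $D = M^{-1}$
if $M$ is (pseudo)-invertible. The resulting filter provides
a noise weighted deconvolution, but is unable to
extrapolate into unobserved regions of the signal space. It
is widely used for map making in the field of cosmic
microwave background observations. For $\delta = 1$ we recover
the critical estimator of [1]. Since it was shown there
that the latter performs significantly better than the
former, and also since $\Pi_{i,1} = 0$ and $\Pi_{i,2}$ is minimal for $\delta = 1$,
we adopt this in the following. For Jeffreys prior we find

$$ p_i = \frac{\mathrm{Tr}(B_i)}{\varrho_i}, \quad\text{with}\quad B_i = (m\, m^\dagger + D)\, S_i^{-1}. \tag{75} $$



4. Second order correction

Including higher order corrections should improve the
reconstruction. The first order corrections vanish for
$\delta = 1$. The second order correction yields

$$ m = D' j, \quad D = T \Big( M + \sum_i \frac{\gamma_i}{\tilde{q}_i} \Big( X_i - \frac{1}{\tilde{q}_i}\, S_i^{-1} m\, m^\dagger \Big) S_i^{-1} \Big)^{-1}, $$
$$ D' = \Big( M + \sum_i \frac{\gamma_i}{\tilde{q}_i}\, X_i\, S_i^{-1} \Big)^{-1}, \quad\text{with} $$
$$ X_i = 1 + \frac{1}{\tilde{q}_i^{\,2}}\, \mathrm{Tr}\!\left( (m\, m^\dagger + \tfrac{1}{2} D)\, S_i^{-1} D\, S_i^{-1} \right) - \frac{1}{\tilde{q}_i}\, S_i^{-1} D. \tag{76} $$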

The operator $D'$, which is applied to $j$ to generate the
map, and the uncertainty dispersion $D$ are not identical
any more. Neither of them can still be expressed as
$(M + \sum_i p_i^{-1} S_i^{-1})^{-1}$, due to the operator structure of the
$S_i^{-1} D$ and $S_i^{-1} m\, m^\dagger$ terms. This was also found in [1].

However, if we can assume that this operator processes
any channel in the $i$-th band in a similar way, we can
replace $S_i^{-1} D$ and $S_i^{-1} m\, m^\dagger S_i^{-1}$ by their channel
averaged values $\mathrm{Tr}(D\, S_i^{-1})\, S_i^{-1} S_i/\varrho_i$ and
$\mathrm{Tr}(m\, m^\dagger S_i^{-1})\, S_i^{-1}/\varrho_i$, respectively. This permits us
to identify spectral coefficients of
$D = T\, (M + \sum_i p_i^{-1} S_i^{-1})^{-1}$ and
$D' = (M + \sum_i p_i'^{-1} S_i^{-1})^{-1}$. For Jeffreys prior they become

$$ p_i = \frac{\mathrm{Tr}(B_i)}{\varrho_i} \left[ 1 - \frac{2}{\varrho_i} \left( \frac{\mathrm{Tr}(m\, m^\dagger S_i^{-1})}{\mathrm{Tr}(B_i)} \right)^2 \right]^{-1} \quad\text{and} $$
$$ p'_i = \frac{\mathrm{Tr}(B_i)}{\varrho_i} \left[ 1 + \frac{2}{\varrho_i}\, \frac{\mathrm{Tr}(m\, m^\dagger S_i^{-1})\, \mathrm{Tr}(D\, S_i^{-1})}{(\mathrm{Tr}(B_i))^2} \right]^{-1}, \tag{77} $$

where $m$, $D$, and $B_i = (m\, m^\dagger + D)\, S_i^{-1}$ all depend on $p$.
It is obvious that the second order correction increases
$p_i$ by some margin compared to (75), meaning that the
reconstruction uncertainty increases. It is less obvious
how $p'_i$ develops, since at first glance it seems to be
corrected downwards. Note, however, that an increased $p_i$
implies an increased $\mathrm{Tr}(B_i)$, since $D$ grows (spectrally)
with increasing $p_i$.

The fact that we get two differing sets of spectral
coefficients, $p_i$ and $p'_i$, reminds us to regard them as auxiliary
variables of our signal reconstruction algorithm, rather
than as optimal spectrum estimates.
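As a minimal illustration of the zeroth order solution (73)-(75) with $\delta = 1$ and Jeffreys prior, the Python sketch below iterates the map, the dispersion and the spectral coefficient to a fixed point. Everything specific here is an assumption made for the example and not part of the text: a single spectral band with $S_i = 1$, a unit response, white noise of variance `noise_var`, data power exceeding the noise variance, and the hypothetical function name `critical_filter`. Under these assumptions all operators are diagonal and act on each mode separately.

```python
def critical_filter(d, noise_var, T=1.0, n_iter=200):
    """Fixed-point iteration of (73)-(75) for one band, R = 1, white noise.

    d         : list of data values (one per mode of the band)
    noise_var : noise variance, so M = 1/noise_var and j = d/noise_var
    Returns the map m, the (mode-independent) dispersion D and the
    spectral coefficient p.
    """
    rho = len(d)                        # spectral degrees of freedom of the band
    p = sum(x * x for x in d) / rho     # starting guess: data power
    for _ in range(n_iter):
        D_prime = 1.0 / (1.0 / noise_var + 1.0 / p)   # D' = (M + p^-1 S_i^-1)^-1
        m = [D_prime * x / noise_var for x in d]      # m = D' j  (Wiener filter)
        D = T * D_prime                               # D = T D'
        p = sum(mk * mk + D for mk in m) / rho        # p = Tr(B_i) / rho_i, eq. (75)
    return m, D, p

# Hypothetical data with unit noise variance.
data = [2.0, -1.0, 3.0, 0.5]
m_map, D_unc, p_band = critical_filter(data, noise_var=1.0)
```

For this toy setup and $T = 1$ the fixed point can be verified by hand to be $p = \langle d^2 \rangle - \sigma_n^2$, the data power minus the noise variance, provided this is positive. The iteration contracts towards it at a rate set by the signal-to-data power ratio.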



C. Poisson log-normal distribution with unknown 
spectrum 

The combined problem, reconstructing a Poisson log-normal
signal with unknown spectrum, can now be
treated approximatively. The combined free energy for
the Gaussian posterior approximation to zeroth order is

$$ G(m,D) \approx 1^\dagger R\, e^{b m + b^2 \hat{D}/2} - d^\dagger \log(R\, e^{b m}) + \sum_i \gamma_i \log\!\left( q_i + \tfrac{1}{2}\, \mathrm{Tr}\!\left( (m\, m^\dagger + D)\, S_i^{-1} \right) \right) - \frac{T}{2}\, \mathrm{Tr}(1 + \log(2 \pi D)). \tag{78} $$

The resulting map and uncertainty dispersion are
provided by (59), with the addition that $S = \sum_i p_i\, S_i$ and
the $p_i$s are provided by (74). Higher order corrections can
be included in a similar way as in the individual problems.
Also background counts with known or unknown
covariance structure can be included in the same way
they were treated in Sect. III A 5.



IV. INFORMATION SYNTHESIS 
A. Multi-temperature posterior 

Although the obtained Gaussian knowledge states from 
minimal free energy estimation are approximative and 
therefore of limited accuracy, they might permit us to 
construct more accurate models of the posterior. The 
idea is to combine several Gaussian distributions to a 
more accurate approximation of the true non-Gaussian 
posterior probability, and to measure the mean map and 
its uncertainty dispersion from this combination. 

We recall that Gaussian approximations of the posterior
obtained at low temperatures ($T \ll 1$) mostly carry
information on its peak region, while those obtained at
large temperatures ($T \gg 1$) carry information on its
asymptotics. Also the canonical $T = 1$ does not provide a
perfect representation of the posterior, as a Gaussian
approximation for a non-Gaussian PDF never can. However,
by combining such different approximations in an
appropriate way, we should obtain an improved
representation of the correct PDF, which permits much easier
calculation of moments like the signal mean and its
uncertainty variance.

To this end we postulate the existence of a temperature
distribution function $P(T)$, such that

$$ P(s|d) = \int dT\; G(s - m_{(d,T)}, D_{(d,T)})\, P(T) \tag{79} $$

combines the different Gaussians with means $m_{(d,T)}$ and
dispersions $D_{(d,T)}$ to synthesize the right posterior
probability. A formal proof of the existence of $P(T)$, and of the
necessary conditions for this, is beyond the scope of this
work. It should be noted that e.g. multi-peaked
distributions cannot accurately be represented by approximate
Gaussians obtained at different temperatures. They can,
however, often be well approximated by Gaussians
centered on those peaks. The recipes described below do
not depend on the way the different Gaussians used in
the mixture model were obtained, and therefore can also
be used in such cases.

In the following we provide a recipe to construct $P(T)$
in practice. We assume that $m_i = m_{(d,T_i)}$ and
$D_i = D_{(d,T_i)}$ have been computed for a number $N_T$ of
temperatures $T_i$. The temperatures are best chosen to sample
well the different parts of the posterior: its peak by having
some $T_i \ll 1$, the bulk of the PDF with $T_i = 1$, and the
PDF tails with $T_i > 1$.

The surrogate probability function we want to construct,
and which should resemble the exact one as closely
as possible, is therefore of the form

$$ \tilde{P}(s|d) = \sum_{i=1}^{N_T} G(s - m_i, D_i)\, P_i. \tag{80} $$

$\tilde{P}(s|d)$ should be as close as possible to $P(s|d)$ in an
information theoretical sense. The natural choice for the
distance measure is the Kullback-Leibler divergence, which
measures the cross-information of $\tilde{P}(s|d)$ on $P(s|d)$, and
which is practically identical to the free energy $G[\tilde{P}(s|d)]$
of our surrogate posterior according to (34). Introducing
un-normalized probabilities $p_i$ as our degrees of freedom,
and setting $P_i = p_i/Z_p$ with $Z_p = \sum_j p_j$ in order to
enforce the proper normalization, $\sum_i P_i = 1$, this reads

$$ G(p) = \sum_i P_i \left( U_i - \tilde{U}_i(p) \right) - F. \tag{81} $$

We have introduced the free energy $F = -\log Z_d$ of the
original problem, which is irrelevant here since it is
independent of $p$, and the energies $U_i$ and $\tilde{U}_i(p)$ with
respect to the template distributions $G_i(s) = G(s - m_i, D_i)$:

$$ U_i = \langle H(d,s) \rangle_{G_i} = \int \mathcal{D}s\; G_i(s)\, H(d,s) \quad\text{and} $$
$$ \tilde{U}_i(p) = \langle H_p(s) \rangle_{G_i}, \quad\text{with} \tag{82} $$
$$ H_p(s) = -\log\!\Big( \sum_j p_j\, G_j(s)/Z_p \Big). $$



B. Minimizing the Gibbs energy

1. Analytical scheme

Now one has to minimize $G(p)$ with respect to $p$. The
problem of calculating the path integrals defining the
energies was already addressed in this work. A systematic
way is to Taylor-Fréchet expand the Hamiltonians
around the centers $m_i$ of the Gaussians and then use the
known moments of $G_i(s)$ to approximate the energies.
For the surrogate energies this yields up to second order
in $\phi_i = s - m_i$

$$ \tilde{U}_i(p) \approx -\log g_i + \frac{1}{2} \sum_j \frac{g_{ji}}{g_i}\, \mathrm{Tr}\!\left[ \left( D_j^{-1} - D_j^{-1} m_{ij}\, m_{ij}^\dagger D_j^{-1} \right) D_i \right] + \frac{1}{2} \sum_{j,k} \frac{g_{ji}\, g_{ki}}{g_i^2}\, m_{ij}^\dagger D_j^{-1} D_i\, D_k^{-1} m_{ik}, \tag{83} $$

with

$$ g_{ji} = p_j\, G_j(m_i)/Z_p, \quad g_i = \sum_j g_{ji}, \quad\text{and}\quad m_{ij} = m_i - m_j. \tag{84} $$

2. Monte-Carlo scheme

Alternatively, one can approximate the average
$\langle X[s] \rangle_{G_i}$ of a quantity $X[s]$ by sums over $N_i$ sampling
points $\{s_i^{(j)}\}_j$, which can easily be drawn from $G_i(s)$:

$$ \langle X[s] \rangle_{G_i} \approx \sum_j X[s_i^{(j)}]/N_i. \tag{85} $$

This way, $G(p)$ can be approximated, and minimized with
a suitable optimization scheme. The sampling points,
their Gaussian probabilities $g_{ki}^{(j)} = G_k(s_i^{(j)})$, as well as
the energies $U_i$ need only be calculated once, but the
surrogate energies
$\tilde{U}_i(p) = \log Z_p - \sum_j \log(\sum_k p_k\, g_{ki}^{(j)})/N_i$
have to be updated at any step of the scheme.

One might argue that if we use stochastic methods
to build $\tilde{P}(s|d)$, one could have used a Markov-Chain
Monte-Carlo (MCMC) method right from the beginning
for the signal inference problem. However, we expect that
the posterior synthesis method described here should
reproduce the correct posterior better than a sample point
cloud, since we are using well adapted Gaussians as our
building blocks and not delta functions as the direct
MCMC approach uses. Furthermore, the analytical and
sampling methods can be combined, in that the analytical
estimates are combined with the sampling estimates
of the contributions of the neglected terms in the
Taylor-Fréchet expansions of (55). And finally, since our scheme
draws samples from Gaussians, it can be trivially
parallelized, which is not easily possible with MCMC schemes.
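The Monte-Carlo estimate of the surrogate energies, $\tilde{U}_i(p) = \log Z_p - \sum_j \log(\sum_k p_k\, G_k(s_i^{(j)}))/N_i$, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text: the signal is a single real number, so that means and dispersions are scalars, and the names `gauss`, `draw_samples` and `surrogate_energies` as well as all numbers are hypothetical.

```python
import math
import random

def gauss(x, mean, var):
    """One-dimensional Gaussian G(x - mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def draw_samples(means, variances, n, seed=0):
    """Draw n sampling points from every template Gaussian G_i (done once)."""
    rng = random.Random(seed)
    return [[rng.gauss(m_i, math.sqrt(v_i)) for _ in range(n)]
            for m_i, v_i in zip(means, variances)]

def surrogate_energies(p, means, variances, samples):
    """Estimate U~_i(p) = <-log(sum_k p_k G_k(s) / Z_p)>_{G_i} via eq. (85)."""
    log_Z = math.log(sum(p))
    energies = []
    for s_i in samples:                      # samples[i] were drawn from G_i
        acc = 0.0
        for s in s_i:
            mix = sum(p_k * gauss(s, m_k, v_k)
                      for p_k, m_k, v_k in zip(p, means, variances))
            acc += math.log(mix)
        energies.append(log_Z - acc / len(s_i))
    return energies

# Hypothetical two-component mixture; the weights p are un-normalized.
means, variances = [-1.0, 1.0], [0.5, 2.0]
samples = draw_samples(means, variances, n=2000)
U_tilde = surrogate_energies([1.0, 3.0], means, variances, samples)
```

In an actual minimization of $G(p)$ the Gaussian values $G_k(s_i^{(j)})$ would be tabulated once and only the weighted sums recomputed for each trial $p$. The sketch recomputes them for brevity.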



C. Maps and moments 

Once the minimum of $G(p)$ with respect to $p$ is found,
one has synthesized a posterior approximation with a
Gaussian mixture model. From this, any moment of the
distribution function can easily be calculated. The mean
map can be expressed as

$$ m \approx \langle s \rangle_{\tilde{P}(s)} = \sum_i P_i\, \langle s \rangle_{G_i(s)} = \sum_i P_i\, m_i, \tag{86} $$

as well as the uncertainty dispersion as

$$ D \approx \langle (s - m)\, (s - m)^\dagger \rangle_{\tilde{P}(s)} = \sum_i P_i \left( D_i + m_i\, m_i^\dagger \right) - m\, m^\dagger. \tag{87} $$
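For a scalar signal, the mixture moments (86) and (87) reduce to two weighted sums. The following Python sketch illustrates them; the function name `mixture_moments` and the example numbers are hypothetical, and the one-dimensional setting is an assumption made to keep the example short.

```python
def mixture_moments(P, means, variances):
    """Mean (86) and variance (87) of a one-dimensional Gaussian mixture.

    P         : normalized weights P_i (summing to one)
    means     : component means m_i
    variances : component dispersions D_i
    """
    m = sum(P_i * m_i for P_i, m_i in zip(P, means))
    D = sum(P_i * (D_i + m_i * m_i)
            for P_i, m_i, D_i in zip(P, means, variances)) - m * m
    return m, D

# Hypothetical mixture of a broad and a narrow component.
m_mix, D_mix = mixture_moments([0.25, 0.75], [0.0, 2.0], [1.0, 0.5])
```

The resulting dispersion exceeds the weighted average of the component dispersions whenever the component means differ, since the spread of the means contributes to the total uncertainty.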

We leave the verification and application of the informa- 
tion synthesis method for future work. 

V. CONCLUSIONS 

We have shown that the minimal Gibbs free energy
principle in information field theory can be used to
obtain approximate knowledge states with maximal
cross-information to the exact posterior. The construction of
such knowledge states with Gaussian PDF is relatively
straightforward:

1. The joint PDF of signal and data $P(d,s)$ has to
be specified, e.g. by specifying a data likelihood
$P(d|s)$ and signal prior $P(s)$, and using
$P(d,s) = P(d|s)\, P(s)$.

2. The information Hamiltonian $H$ is the negative
logarithm of this, $H(d,s) = -\log(P(d,s))$.

3. A suitably parametrized PDF as a surrogate for the
posterior has to be specified, e.g. a Gaussian with
its mean and dispersion as degrees of freedom.

4. The internal energy $U$ and entropy $S_B$ of this
PDF have to be calculated as the PDF-average of
the Hamiltonian and the negative log-PDF,
respectively.

5. The Gibbs free energy, $G = U - T\, S_B$, has then to
be minimized with respect to all degrees of freedom
of the surrogate PDF.

6. Any statistical summary like mean and variance
can now be extracted from the surrogate PDF.
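The six steps above can be sketched for the simplest conceivable case. The Python example below assumes, purely for illustration, a single pixel with a Gaussian prior of variance `S` on the log-rate and Poisson data with expected counts $\kappa\, e^{b s}$, so that $H(d,s) = s^2/(2S) - d\, b\, s + \kappa\, e^{b s}$ up to constants. The function names, the damped fixed-point iteration and all numbers are hypothetical choices and not the authors' implementation.

```python
import math

def gibbs_free_energy(m, D, d, S, kappa, b=1.0, T=1.0):
    """Step 4 and 5: G = U - T S_B for a Gaussian surrogate G(s - m, D)."""
    # U = <H>, using <s^2> = m^2 + D and <e^{bs}> = e^{bm + b^2 D / 2}
    U = (m * m + D) / (2.0 * S) - d * b * m + kappa * math.exp(b * m + 0.5 * b * b * D)
    S_B = 0.5 * (1.0 + math.log(2.0 * math.pi * D))   # entropy of the Gaussian
    return U - T * S_B

def minimize_gibbs(d, S, kappa, b=1.0, T=1.0, damping=0.2, n_iter=500):
    """Step 5: damped fixed-point iteration on dG/dm = 0 and dG/dD = 0."""
    m, D = 0.0, S
    for _ in range(n_iter):
        lam = kappa * math.exp(b * m + 0.5 * b * b * D)  # expected counts under surrogate
        m_new = S * b * (d - lam)                        # from dG/dm = 0
        D_new = T / (1.0 / S + b * b * lam)              # from dG/dD = 0
        m += damping * (m_new - m)                       # damping keeps the update stable
        D += damping * (D_new - D)
    return m, D

# Step 6: mean and variance of the surrogate for hypothetical data d = 3.
m_opt, D_opt = minimize_gibbs(d=3.0, S=1.0, kappa=1.0)
G_opt = gibbs_free_energy(m_opt, D_opt, d=3.0, S=1.0, kappa=1.0)
```

The damping is a design choice of this sketch: the undamped update of $m$ overshoots whenever $S\, b^2 \kappa\, e^{b m + b^2 D/2} > 1$, which is the case for the numbers used here.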

The minimal free energy principle is therefore well suited
to tackle statistical inference problems. We have
demonstrated this with two different problems and their
combination: reconstructing a log-normal field from Poisson
data subject to a point spread function, and reconstruction
without prior knowledge of the signal power spectrum.
Earlier results from renormalization calculations
in [1] have been reproduced. The renormalization schemes
used there can therefore be understood as aiming for a
surrogate Gaussian PDF which has maximal cross
information with the correct posterior. Since these
results were previously shown to reconstruct well, the
method proposed here for the more complicated combined
case can also be expected to work. However, a detailed
implementation and verification of this is left for future
work.

Finally we have sketched how Gaussian knowledge 
states obtained at different thermodynamical tempera- 
tures can be combined into a more accurate representa- 
tion of the posterior, from which moments of the signal 
uncertainty distributions can easily be extracted. 

The minimal Gibbs energy and maximal cross informa- 
tion principle introduced here to IFT should allow the 
construction of novel reconstruction schemes for statis- 
tical inference problems on spatially distributed signals. 
The thermodynamical language may help to clarify
concepts and to simplify applications of IFT, since it permits
us to tackle non-linear inverse problems without the need
to use diagrammatic perturbation theory and
renormalization schemes.